TEXT-TO-SPEECH STREAMING VIA A NETWORK



FIELD OF THE INVENTION 


The present invention relates to a method and a system for converting text messages into
streaming audio data, as well as for communicating streaming audio data over a network.






BACKGROUND OF THE INVENTION 



Nowadays, there is an increasing need for communicating audio and video data via networks.
One of the requirements to be met by network users is the availability of multi-media
applications at the user's access equipment. These multi-media applications include audio and
video software that is used to play, retrieve, and create audio and video content. Also needed
is multi-media supporting equipment such as sound cards, audio cards, microphones and
speakers.

One of the developments in recent years with respect to audio applications is the availability of
computer-based techniques for converting text data into speech. With such techniques, text
data is translated into audio information by text-to-speech conversion software. Examples of
text-to-speech software include Apple Computer's Speech Manager and Digital Equipment
Corporation's DECtalk. A text-to-speech engine generally comprises a text analyzer, a syntax
and context analyzer and a synthesis module. Using a text-to-speech engine, users can convert
text data into audio data on their own equipment such as a personal computer. Via an output
device, such as a loudspeaker, the audio content that is contained in the audio data can be
heard or interpreted by a human being.

Also well known nowadays are streaming techniques for retrieving audio data. As an example,
streaming techniques are used for real-time radio on the Internet. Streaming audio refers to
audio being played 'on the fly' as more audio data comes in. In other words, the receiving
system, such as a personal computer, does not wait until the entire audio data input is received
before starting playback.

As stated before, a necessity for users intending to create and distribute audio data is the
availability of audio supporting equipment such as a microphone and a sound card, and audio
supporting software applications such as a media player. Another necessity arises when a
user intends to share audio data with, or communicate it to, other users over a network. The user
should then be acquainted with the steps to be taken for communicating the generated audio
data and sending it to other users. This requirement can exclude users with no or relatively basic
know-how relating to multi-media applications from sharing audio data with other users. The
situation is even more complex for a user if the audio data should be sent to another user in
association with other information such as, but not limited to, an electronic
document, a picture or an HTML page.


A possible way to create, post and retrieve audio data is known from United States Patent
Application No. US 2002/0056351. According to this known method it is possible to post audio
files to a centrally located server, and to associate audio files with documents. However, this
known method does not include text-to-speech facilities; a user still needs a device, such as a
personal computer, that includes specific hardware, such as a microphone and an audio card,
and supporting software to create audio data. As a consequence, a user should have the
appropriate knowledge for using, installing, and configuring this type of hardware. Also, for
purposes where it is more appropriate to convert text data, such as an electronic text document,
into audio data, the known method is not efficacious. This can be the case if a user is a disabled
person not able to speak or use his or her voice in a proper way. This can also be the case if a
user is in a public place while using an access device in order to send audio data to another
user. In the latter case a user may prefer to convert a text message into audio data using a
text-to-speech application instead of recording his or her own voice. Another drawback of the
method known from United States Patent Application No. US 2002/0056351 is that it does not
comprise the retrieving of streaming audio data by a user.



AIM OF THE INVENTION 


It is an object of the invention to eliminate the drawbacks of the prior art and to provide a
method and a system that enable network-users to convert a first type of data into a second
type of data without local conversion facilities, and to communicate the second type of data to
other network-users, where it is received as streaming data.




SUMMARY OF THE INVENTION 

In accordance with this invention, a method, a platform, and software are disclosed for converting
a first type of data into a second type of data. The conversion of the first type of data takes
place at a remote conversion server connected to a network accessible for a user. For this
purpose, the method according to the present invention comprises the steps of:






selecting or entering by an originating network-user (17) the first type of data (10),

associating an object with the first type of data (10),

sending the first type of data (10) via a network (3) to a service platform (5),

and thereafter converting the first type of data (10) into the second type of data (15) at
the service platform (5).






This step of the method enables users to convert a first type of data, such as text data, into a
second type of data, such as audio data or video data, without having conversion facilities
available locally. In the case of text data to be converted into audio data, this invention
solves the problem in the prior art that audio supporting equipment or text-to-speech facilities
should be available locally. The text data can be a text message that is sent by a network user
via a network to a server connected to the network. Additionally or optionally, the text data can
be a part of an electronic text document or any other alphanumeric source. The network
accessible for the user can be the Internet, or any type of public or private network.

The method according to the invention can also include the step of sending the second type of
data as streaming data to another user via a server connected to the network. In the case of
text data being converted into audio data, this means that, together with the remote
text-to-speech facilities of the first step, upstream text data is received downstream as
streaming audio data. To accomplish this, a user sends the text data and an identification of
the addressed user over the network to interacting servers, databases and other computer
programs connected to the network. The interacting servers, databases and other computer
programs process the input received from the user, resulting in streaming audio data to be
received by the addressed user. An identification code can be used to identify the text data.

The method according to the invention can further include the step of associating the second type
of data with a file or any other type of electronic document including, but not limited to, text
documents, images and HTML documents. If the second type of data is audio data, it can be
associated with an HTML document to help interpret what can be seen on the HTML
document. If the second type of data is video data, it can be associated with a text document to
visualize what can be found in the text document. A file can be selected by a user from a
collection of files centrally available at a server connected to the network, or from a collection of
files locally available at the access device of the user.

With the functionality of the service platform (5), users are able to create and distribute a certain
type of data without having the facilities required for this available locally.

The foregoing aspects and many of the attendant advantages of this invention will become better
understood by reference to the following detailed description, when taken in conjunction with the
accompanying drawing, wherein:

FIG. 1 is a block diagram illustrating the components involved if the first type of data is text data
(10) and the second type of data is audio data (15).



EXEMPLARY EMBODIMENTS 


For the purpose of teaching the invention, preferred embodiments of the method and devices
of the invention are described below. It will be apparent to the person skilled in the art
that other alternative and equivalent embodiments of the invention can be conceived and
reduced to practice without departing from the true spirit of the invention, the scope of the
invention being limited only by the appended claims as finally granted.

FIG. 1 shows an embodiment of the invention in the case of converting text data (10) into audio
data (15). Referring to FIG. 1, there is a network (3) that connects network-users. The network
(3) can be a fixed or mobile network. The network (3) may be a public network, such as the
Internet, or a private network. The network (3) may be a non-secure network or a network that is
perceived as being non-secure, although secure networks are not excluded in relation to this
invention. The network (3) can be facilitated by a service provider, such as an Internet service
provider, although the network (3) can also be facilitated by an organization providing access
to remote sites for specific groups of customers. In the latter case the customer is able to
access one or more remote locations directly, i.e. without using the Internet.

A server (4) is connected to the network (3). There may be many different servers (4),
geographically or functionally separated from each other and each managed, controlled and
exploited by different parties. The server (4) in the embodiment depicted in FIG. 1 is a
microprocessor-based system comprising a processing unit and a memory, although many other
features, facilities and components may be part of the server (4) too. One or more application
programs are stored in the memory of the server (4) and execute on the CPU of the server
(4). The server (4) can be a system operating under UNIX, NT or any other related operating
system. An application residing at the server (4) may be a computer program such as a WWW
server, although the present invention does not exclude applications that are not related to
Internet technology. As an alternative to being accessible via the Internet, the server (4) can be
part of a private domain accessible for a closed user group. In the latter case, the server (4)
may be hosting IP-based or non-IP-based applications and information. The server (4) and the
applications residing on it may be operated and exploited by an electronic merchant. The server
(4) and the service platform (5) may be located at the same physical location.

An originating network-user (17) is connected to the network (3). The originating network-user
(17) is a user that initiates the process of sending streaming audio data to a receiving
network-user (18). The originating network-user (17) uses an originating access device (1) for
accessing the network (3). The originating access device (1) is a device for accessing a mobile or
fixed network, such as a telephone, a laptop or a personal computer. If the originating access
device (1) is a telephone, it preferably is a touch-tone telephone that is able to send and receive
short messages (SMs). An IP telephone may be used in connection with the present invention too.
Wireless devices, such as Bluetooth supporting devices, are also taken into account with regard
to this invention. The originating access device (1) may also be part of a local area network.
Peripheral devices like a modem and a mouse are not shown. The originating access device (1)
has limited or in some cases no facilities available for retrieving, playing, recording and sending
audio data. Additionally, the originating network-user (17) could have a limited understanding of
using or installing multi-media applications and hardware on the originating access device (1).
So even if the appropriate multi-media applications and hardware are available on the
originating access device (1), the originating network-user (17) may not be able to retrieve, record,
send or play audio data, because the originating network-user (17) is not familiar with the
usage of these multi-media applications and hardware. The physical connection between the
originating access device (1) and the network (3) can be through a modem and a telephone
line, a networking device and a leased line, or any type of wireless connection means. The
details of the type of connection between the originating access device (1) and the network (3)
are of no consequence in the present invention.

Again with reference to FIG. 1, the dashed line relates to the service platform (5). The service
platform (5) can be operated and exploited by a service provider. The service platform (5) consists
of a number of entities, which are discussed hereafter. The entity where the conversion of the
text data (10) into audio data (15) is managed is a TTS (text-to-speech) manager (6), which is a
CGI (Common Gateway Interface) program. The TTS manager (6) has access to a storage
means (7). A media encoder (8) is connected to the TTS manager (6). The media encoder (8) is
an application that generates one or more audio data streams simultaneously based on the
input that is received from the TTS server (9). The TTS server (9) comprises software that
converts text into audio data (15). The TTS manager (6), the media encoder (8) and the TTS
server (9) may be hosted by one physical system or may each be hosted by a separate
physical system. Usually but not necessarily, the service platform (5) is protected against
threats originating from the network (3) by means of a firewall (not shown).

Referring to FIG. 1, the originating network-user (17) accesses the server (4) via the network (3).
If the application on the server (4) is a website, the originating network-user (17) can invoke the
TTS service through an HTML hyperlink. Access to the functionality of the service platform (5) is
provided via a payment mechanism. The payment mechanism can be based upon the usage of
a credit card or it can be any other payment mechanism, for instance based on dialing a 0800
telephone number. The originating network-user (17) can compose the text data (10) and send
the text data (10) to the server (4). Creating the text data (10) can be done in many different
ways. The text data (10) can be created by the originating network-user (17) by using a text
editor, an e-mail program, a browser program or, in case the originating access device (1) is a
telephone, simply by entering the text data (10) via a user-interface. A destination address (19)
to identify the receiving network-user (18) is sent by the originating network-user (17) together
with the text data (10) to the server (4). The destination address (19) can be an e-mail address
or any type of identification number. The destination address (19) can be sent simultaneously
along with the text data (10), or can be sent before or after sending the text data (10).

Optionally or alternatively, the originating network-user (17) can associate an object with the
text data (10). The object can be an image according to any type of format, such as but not
limited to the JPEG or GIF format. The object can also be a video sequence according to any
type of format, streaming or non-streaming, such as MPEG and VIVO. The object can also be
an HTML document or any kind of file, including text documents and graphical files. It is
emphasized that these examples are provided merely for illustration and not limitation.
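
By way of illustration only, the information sent by the originating network-user (17) to the
server (4) could be bundled as sketched below. The function name, the field names and the use of
JSON are assumptions made for this example; the description above does not prescribe any
particular message format.

```python
import json
from typing import Optional


def build_submission(text_data: str, destination_address: str,
                     associated_object: Optional[str] = None) -> str:
    """Serialize what the originating network-user (17) sends to the server (4)."""
    if not text_data.strip():
        raise ValueError("text data (10) must not be empty")
    payload = {
        "text_data": text_data,                      # text data (10)
        "destination_address": destination_address,  # destination address (19)
    }
    if associated_object is not None:
        # Optional object, e.g. the name or address of an image or HTML document.
        payload["associated_object"] = associated_object
    return json.dumps(payload)
```

In this sketch the destination address travels together with the text data; sending it before or
after the text data, as also described above, would require a separate message.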

After the text data (10) is received by the server (4), the text data (10) will be sent to the TTS
manager (6). In an embodiment according to this invention, a code (11) can be sent
together with the text data (10) to the TTS manager (6). This code (11) can be used to identify
the server (4) that has sent the text data (10). Based on the code (11), accounting can take
place between the service provider that operates the service platform (5) and the electronic
merchant that operates the server (4).
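
As a minimal sketch of how accounting based on the code (11) might be kept, the example below
counts requests per code and derives an amount per merchant. The registry contents, the function
names and the flat price per request are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Hypothetical registry mapping each code (11) to the merchant operating a server (4).
REGISTERED_SERVERS = {"srv-001": "Example Merchant A", "srv-002": "Example Merchant B"}

usage = Counter()  # number of text submissions received per code (11)


def record_submission(code: str) -> None:
    """Count one conversion request against the server identified by `code`."""
    if code not in REGISTERED_SERVERS:
        raise KeyError(f"unknown server code: {code}")
    usage[code] += 1


def statement(price_per_request: float) -> dict:
    """Amount owed per merchant, assuming a flat (hypothetical) price per request."""
    return {REGISTERED_SERVERS[c]: n * price_per_request for c, n in usage.items()}
```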


After receiving the text data (10) and the code (11), the TTS manager (6) performs a validity
check on the code (11). If the code (11) is valid, the TTS manager (6) stores the text data (10)
in the storage means (7). The TTS manager (6) also generates an activation code (12) that is
also stored in the storage means (7). The activation code (12) may be a unique code. The
activation code (12) refers to the text data (10) via a link, pointer or any other mechanism to
associate the text data (10) with the activation code (12).
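
The handling described in this paragraph can be sketched as follows. The in-memory dictionary
standing in for the storage means (7), the set of accepted codes and the use of a random token as
activation code (12) are assumptions of this example, not features taken from the embodiment.

```python
import secrets

VALID_SERVER_CODES = {"srv-001"}   # hypothetical set of accepted codes (11)
storage = {}                       # stands in for the storage means (7)


def accept_text(text_data: str, code: str) -> str:
    """Validate the code (11), store the text data (10), return an activation code (12)."""
    if code not in VALID_SERVER_CODES:
        raise PermissionError("invalid code (11)")
    activation_code = secrets.token_urlsafe(16)   # random and hard to guess
    storage[activation_code] = text_data          # the key acts as the link to the text
    return activation_code
```

Using the activation code as the storage key is only one way of realizing the "link, pointer or
any other mechanism" mentioned above.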

The TTS manager (6) sends a reference address, such as a URL (Uniform Resource
Locator), with the activation code (12) as a parameter to an application, such as a web server,
at the server (4). The reference address refers to the TTS manager (6), and is used to indicate
the location of the TTS manager (6). If the systems described in this disclosure are based on IP
related technology, the reference address represents an IP address. Alternatively, the reference
address represents some other identification of a network entity or application.
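
For the URL case, a reference address carrying the activation code (12) as a parameter could be
built and parsed as in the sketch below. The host name, the path and the parameter name `code`
are hypothetical and do not appear in the description above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical location of the TTS manager (6); not taken from the source.
TTS_MANAGER_URL = "http://tts.example.com/cgi-bin/tts-manager"


def reference_address(activation_code: str) -> str:
    """Build the reference address with the activation code (12) as a parameter."""
    query = urlencode({"code": activation_code})  # percent-encodes unsafe characters
    return TTS_MANAGER_URL + "?" + query


def activation_code_from(address: str) -> str:
    """Recover the activation code (12) on the side of the TTS manager (6)."""
    return parse_qs(urlsplit(address).query)["code"][0]
```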

At the server (4) a webpage is created that contains the reference address to the TTS
manager (6). The webpage also contains a media player that can be started by the receiving
network-user (18). The server (4) also sends an e-mail message (14) containing another
reference address to the receiving network-user (18). The other reference address refers to the
webpage created by the server (4). After receiving the e-mail message (14), the receiving
network-user (18) can access the webpage by selecting the reference address (or clicking the
URL) received in the e-mail message (14). Having accessed the webpage, the receiving
network-user (18) can start the media player, resulting in the activation code (12) being sent to
the TTS manager (6) and consequently activating the TTS manager (6).
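
Purely as an illustration, the webpage and the e-mail message (14) could be produced as sketched
below. The HTML `<audio>` element is used here as a stand-in for the media player, and the
subject line, function names and example addresses are assumptions of this sketch rather than
part of the embodiment.

```python
from email.message import EmailMessage
from html import escape


def build_webpage(tts_reference_address: str) -> str:
    """Page created at the server (4); the <audio> element stands in for the media player."""
    src = escape(tts_reference_address, quote=True)
    return f'<html><body><audio controls src="{src}"></audio></body></html>'


def build_notification(recipient: str, webpage_url: str) -> EmailMessage:
    """E-mail message (14) carrying the reference address of the webpage."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = "You have received an audio message"
    msg.set_content(f"Listen to your message here: {webpage_url}")
    return msg
```

Actually delivering the message, for example via an SMTP server, is left out of the sketch.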


After having received the activation code (12), the TTS manager (6) checks the validity of the
activation code (12). If the activation code (12) is valid, the TTS manager (6) retrieves the
corresponding text data (10) from the storage means (7). The TTS manager (6) sends the text
data (10) to a TTS server (9), where the text data (10) is converted into audio data (15). It is
not necessary to store the audio data (15) in the storage means (7), although in some other
embodiments of the present invention it can be possible to store the audio data (15) before it is
processed by a media encoder (8). Avoiding storing the audio data (15) in the storage
means (7) reduces the required memory capacity, and avoids costs relating to the usage of the
software residing at the TTS server (9), like license fees. The audio data (15) is sent to the
media encoder (8), resulting in an audio data stream (16). The audio data stream (16) can be
sent to the receiving network-user (18), where the audio data stream (16) is played using the
media player available for the receiving network-user (18). The end of the process can be
determined using different techniques, such as the detection of a period of inactivity.
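
The sequence described in this paragraph can be sketched as follows. The synthesis and encoding
functions are placeholders that merely pass bytes along so that the example runs without audio
software; the chunk size, the stored sample text and the inactivity timeout are likewise
assumptions of this sketch.

```python
from typing import Iterator

storage = {"abc123": "Hello world"}   # storage means (7) holding one hypothetical text


def synthesize(text: str) -> bytes:
    """Stand-in for the TTS server (9); a real engine would return audio samples."""
    return text.encode("utf-8")


def encode_stream(audio: bytes, chunk_size: int = 4) -> Iterator[bytes]:
    """Stand-in for the media encoder (8): yield the audio in small chunks."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def handle_activation(activation_code: str) -> Iterator[bytes]:
    """TTS manager (6): validate the code, fetch the text, convert it and stream it."""
    if activation_code not in storage:
        raise KeyError("invalid activation code (12)")
    # The audio data is passed straight to the encoder and is not written to storage.
    return encode_stream(synthesize(storage[activation_code]))


def is_finished(last_chunk_time: float, now: float, timeout: float = 2.0) -> bool:
    """Treat the stream as ended once no chunk has arrived for `timeout` seconds."""
    return now - last_chunk_time >= timeout
```

A receiving media player would consume the chunks as they arrive, for example with
`for chunk in handle_activation("abc123"): ...`, and could call `is_finished` to decide when to stop
waiting for more data.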



